93 research outputs found

    MEDock: a web server for efficient prediction of ligand binding sites based on a novel optimization algorithm

    Get PDF
    The prediction of ligand binding sites is an essential part of the drug discovery process. Knowing the location of binding sites greatly facilitates the search for hits, the lead optimization process, the design of site-directed mutagenesis experiments and the hunt for structural features that influence the selectivity of binding in order to minimize the drug's adverse effects. However, docking is still the rate-limiting step for such predictions; consequently, much more efficient algorithms are required. In this article, the design of the MEDock web server is described. The goal of this sever is to provide an efficient utility for predicting ligand binding sites. The MEDock web server incorporates a global search strategy that exploits the maximum entropy property of the Gaussian probability distribution in the context of information theory. As a result of the global search strategy, the optimization algorithm incorporated in MEDock is significantly superior when dealing with very rugged energy landscapes, which usually have insurmountable barriers. This article describes four different benchmark cases that span a diverse set of different types of ligand binding interactions. These benchmarks were compared with the use of the Lamarckian genetic algorithm (LGA), which is the major workhorse of the well-known AutoDock program. These results demonstrate that MEDock consistently converged to the correct binding modes with significantly smaller numbers of energy evaluations than the LGA required. When judged by a threshold of the number of energy evaluations consumed in the docking simulation, MEDock also greatly elevates the rate of accurate predictions for all benchmark cases. MEDock is available at and

    Crosstalk between transcription factors and microRNAs in human protein interaction network

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Gene regulatory networks control the global gene expression and the dynamics of protein output in living cells. In multicellular organisms, transcription factors and microRNAs are the major families of gene regulators. Recent studies have suggested that these two kinds of regulators share similar regulatory logics and participate in cooperative activities in the gene regulatory network; however, their combinational regulatory effects and preferences on the protein interaction network remain unclear.</p> <p>Methods</p> <p>In this study, we constructed a global human gene regulatory network comprising both transcriptional and post-transcriptional regulatory relationships, and integrated the protein interactome into this network. We then screened the integrated network for four types of regulatory motifs: single-regulation, co-regulation, crosstalk, and independent, and investigated their topological properties in the protein interaction network.</p> <p>Results</p> <p>Among the four types of network motifs, the crosstalk was found to have the most enriched protein-protein interactions in their downstream regulatory targets. The topological properties of these motifs also revealed that they target crucial proteins in the protein interaction network and may serve important roles of biological functions.</p> <p>Conclusions</p> <p>Altogether, these results reveal the combinatorial regulatory patterns of transcription factors and microRNAs on the protein interactome, and provide further evidence to suggest the connection between gene regulatory network and protein interaction network.</p

    A sequence-based hybrid predictor for identifying conformationally ambivalent regions in proteins

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Proteins are dynamic macromolecules which may undergo conformational transitions upon changes in environment. As it has been observed in laboratories that protein flexibility is correlated to essential biological functions, scientists have been designing various types of predictors for identifying structurally flexible regions in proteins. In this respect, there are two major categories of predictors. One category of predictors attempts to identify conformationally flexible regions through analysis of protein tertiary structures. Another category of predictors works completely based on analysis of the polypeptide sequences. As the availability of protein tertiary structures is generally limited, the design of predictors that work completely based on sequence information is crucial for advances of molecular biology research.</p> <p>Results</p> <p>In this article, we propose a novel approach to design a sequence-based predictor for identifying conformationally ambivalent regions in proteins. The novelty in the design stems from incorporating two classifiers based on two distinctive supervised learning algorithms that provide complementary prediction powers. Experimental results show that the overall performance delivered by the hybrid predictor proposed in this article is superior to the performance delivered by the existing predictors. Furthermore, the case study presented in this article demonstrates that the proposed hybrid predictor is capable of providing the biologists with valuable clues about the functional sites in a protein chain. The proposed hybrid predictor provides the users with two optional modes, namely, the <it>high-sensitivity </it>mode and the <it>high-specificity </it>mode. The experimental results with an independent testing data set show that the proposed hybrid predictor is capable of delivering sensitivity of 0.710 and specificity of 0.608 under the <it>high-sensitivity </it>mode, while delivering sensitivity of 0.451 and specificity of 0.787 under the <it>high-specificity </it>mode.</p> <p>Conclusion</p> <p>Though experimental results show that the hybrid approach designed to exploit the complementary prediction powers of distinctive supervised learning algorithms works more effectively than conventional approaches, there exists a large room for further improvement with respect to the achieved performance. In this respect, it is of interest to investigate the effects of exploiting additional physiochemical properties that are related to conformational ambivalence. Furthermore, it is of interest to investigate the effects of incorporating lately-developed machine learning approaches, e.g. the random forest design and the multi-stage design. As conformational transition plays a key role in carrying out several essential types of biological functions, the design of more advanced predictors for identifying conformationally ambivalent regions in proteins deserves our continuous attention.</p

    Protemot: prediction of protein binding sites with automatically extracted geometrical templates

    Get PDF
    Geometrical analysis of protein tertiary substructures has been an effective approach employed to predict protein binding sites. This article presents the Protemot web server that carries out prediction of protein binding sites based on the structural templates automatically extracted from the crystal structures of protein–ligand complexes in the PDB (Protein Data Bank). The automatic extraction mechanism is essential for creating and maintaining a comprehensive template library that timely accommodates to the new release of PDB as the number of entries continues to grow rapidly. The design of Protemot is also distinctive by the mechanism employed to expedite the analysis process that matches the tertiary substructures on the contour of the query protein with the templates in the library. This expediting mechanism is essential for providing reasonable response time to the user as the number of entries in the template library continues to grow rapidly due to rapid growth of the number of entries in PDB. This article also reports the experiments conducted to evaluate the prediction power delivered by the Protemot web server. Experimental results show that Protemot can deliver a superior prediction power than a web server based on a manually curated template library with insufficient quantity of entries. Availability:

    Predicting microRNA precursors with a generalized Gaussian components based density estimation algorithm

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>MicroRNAs (miRNAs) are short non-coding RNA molecules, which play an important role in post-transcriptional regulation of gene expression. There have been many efforts to discover miRNA precursors (pre-miRNAs) over the years. Recently, <it>ab initio </it>approaches have attracted more attention because they do not depend on homology information and provide broader applications than comparative approaches. Kernel based classifiers such as support vector machine (SVM) are extensively adopted in these <it>ab initio </it>approaches due to the prediction performance they achieved. On the other hand, logic based classifiers such as decision tree, of which the constructed model is interpretable, have attracted less attention.</p> <p>Results</p> <p>This article reports the design of a predictor of pre-miRNAs with a novel kernel based classifier named the generalized Gaussian density estimator (G<sup>2</sup>DE) based classifier. The G<sup>2</sup>DE is a kernel based algorithm designed to provide interpretability by utilizing a few but representative kernels for constructing the classification model. The performance of the proposed predictor has been evaluated with 692 human pre-miRNAs and has been compared with two kernel based and two logic based classifiers. The experimental results show that the proposed predictor is capable of achieving prediction performance comparable to those delivered by the prevailing kernel based classification algorithms, while providing the user with an overall picture of the distribution of the data set.</p> <p>Conclusion</p> <p>Software predictors that identify pre-miRNAs in genomic sequences have been exploited by biologists to facilitate molecular biology research in recent years. The G<sup>2</sup>DE employed in this study can deliver prediction accuracy comparable with the state-of-the-art kernel based machine learning algorithms. Furthermore, biologists can obtain valuable insights about the different characteristics of the sequences of pre-miRNAs with the models generated by the G<sup>2</sup>DE based predictor.</p

    HAHap: a read-based haplotyping method using hierarchical assembly

    Get PDF
    Background The need for read-based phasing arises with advances in sequencing technologies. The minimum error correction (MEC) approach is the primary trend to resolve haplotypes by reducing conflicts in a single nucleotide polymorphism-fragment matrix. However, it is frequently observed that the solution with the optimal MEC might not be the real haplotypes, due to the fact that MEC methods consider all positions together and sometimes the conflicts in noisy regions might mislead the selection of corrections. To tackle this problem, we present a hierarchical assembly-based method designed to progressively resolve local conflicts. Results This study presents HAHap, a new phasing algorithm based on hierarchical assembly. HAHap leverages high-confident variant pairs to build haplotypes progressively. The phasing results by HAHap on both real and simulated data, compared to other MEC-based methods, revealed better phasing error rates for constructing haplotypes using short reads from whole-genome sequencing. We compared the number of error corrections (ECs) on real data with other methods, and it reveals the ability of HAHap to predict haplotypes with a lower number of ECs. We also used simulated data to investigate the behavior of HAHap under different sequencing conditions, highlighting the applicability of HAHap in certain situations

    ProteDNA: a sequence-based predictor of sequence-specific DNA-binding residues in transcription factors

    Get PDF
    This article presents the design of a sequence-based predictor named ProteDNA for identifying the sequence-specific binding residues in a transcription factor (TF). Concerning protein–DNA interactions, there are two types of binding mechanisms involved, namely sequence-specific binding and nonspecific binding. Sequence-specific bindings occur between protein sidechains and nucleotide bases and correspond to sequence-specific recognition of genes. Therefore, sequence-specific bindings are essential for correct gene regulation. In this respect, ProteDNA is distinctive since it has been designed to identify sequence-specific binding residues. In order to accommodate users with different application needs, ProteDNA has been designed to operate under two modes, namely, the high-precision mode and the balanced mode. According to the experiments reported in this article, under the high-precision mode, ProteDNA has been able to deliver precision of 82.3%, specificity of 99.3%, sensitivity of 49.8% and accuracy of 96.5%. Meanwhile, under the balanced mode, ProteDNA has been able to deliver precision of 60.8%, specificity of 97.6%, sensitivity of 60.7% and accuracy of 95.4%. ProteDNA is available at the following websites
    corecore